Abstract — Detection and extraction of objects in images and 
video sequences is an important and active topic in the 
research community. The most important applications 
concern industrial, civil, and military tasks. This paper 
presents an approach for the detection and automatic 
extraction of objects in video sequences captured by a mobile 
camera. The approach is based on both the optimal orientation 
vision angle and the camera movement model. Furthermore, 
we propose a new algorithm that overcomes a drawback 
of the Active Edge method by recovering the lost 
points on the object edge. Simulations are run in Matlab on 
standard video sequences. The obtained results are 
encouraging. 

Index Terms — Object detection, automatic object extraction, 
mobile camera, orientation vision angle, movement model. 

I. INTRODUCTION 

Detection and extraction of moving objects is a fundamental 
task in many applications such as robot navigation, video 
surveillance, and video indexing. While static object 
detection has reached maturity, with a large body of published 
work and many working systems, the detection and extraction 
of moving objects remains a difficult task and is the subject 
of intensive research. Different approaches have been proposed 
in the literature. Many of them are 
based on pixel classification techniques that exploit a local 
measure linked to apparent movement, such as the 
Displaced Frame Difference (DFD). The classification of pixels 
into static and dynamic zones uses 
thresholding [1]-[3] or Bayesian techniques [4]-[6]. Some 
approaches operate iteratively on pixels or on 
regions [6]. 

The extraction of an initial spatial partition is also exploited 
for the motion segmentation of image sequences, based on 
movement criteria [7] or on intensity, texture, or color 
information [8]-[10]. These methods offer the best precision in 
localizing movement boundaries in terms of intensity, texture, 
or color. From this initial segmentation, a 2D parametric 
movement model is associated with each spatial region, and 
motion segmentation then consists of merging regions. This 
merging can rely on classification in the space of movement 
parameters, on Bayesian approaches such as the Minimum 
Description Length (MDL) principle [7], or on Markovian 
contextual labeling on a region graph [9]. One limitation of these 
approaches is that they cannot exploit a fine spatial 
partition to obtain the parametric movement model, so they 
have a high probability of losing some movement boundaries, 
where many points defining the object edge may be lost. This 
paper presents an approach for the detection and extraction of 
objects in video sequences captured with a mobile camera; 
the approach is based on the optimal orientation vision angle 
and a camera movement model. The remainder of this paper is 
organized as follows: Section II presents the state of the art; 
Section III presents our contribution; 
Sections IV and V present the obtained results and 
discussion; finally, Section VI presents the conclusion and the 
perspectives of this work. 

II. STATE OF THE ART IN OBJECT DETECTION AND 
EXTRACTION 

The detection and localization of objects for extraction in 
digital images and video sequences has become one of the most 
important applications in industry, easing the user's work and 
saving time. Techniques for detecting and extracting objects 
were developed many years ago, but they still need to be 
improved, in particular for moving objects, in order 
to reach their objective more efficiently and 
accurately. Many applications exist in this domain, and the 
literature is abundant. 

In robotic applications, the moving object is tracked by 
a mobile robot equipped with sensors. In [11], the authors 
developed a system in which the robotic platform uses a visual 
camera to sense the movement of the desired object and a 
range sensor to help the robot detect and avoid obstacles in 
real time while continuing to detect and follow the desired 
object. 

In [12], the authors developed a method for detecting 
mangoes on a mango tree. Their method uses color processing 
as a primary filter to eliminate unrelated colors or objects 
in the image, followed by edge detection and the Circular 
Hough Transform. Image and video segmentation and edge 
detection techniques are widely used in object detection and 
information retrieval by several authors, such as [13]-[17]. 

In [18], the author developed a method for object 
recognition with full boundary detection. The method is based 
on the combination of the Affine Scale Invariant Feature 
Transform (ASIFT) and a region merging algorithm. 

In [19], the authors presented a system for the detection of 
static objects in crowded scenes. Their method relies on 
two background models learned at different 
rates, and pixels are classified with the help of a finite-state 
machine. The background is modeled by two mixtures of 
Gaussians with identical parameters except for the learning 
rate. 

In [20], the authors proposed a method for the detection of 
moving objects based on the combination of an adaptive filtering 
technique and a Bayesian change detection algorithm. An 
adaptive structure first detects the edges of the moving objects; 
then the Bayesian algorithm corrects the shape of the detected 
objects. 

III. PROPOSED OBJECT DETECTION AND 
EXTRACTION SYSTEM 

Our contribution to object detection and extraction is divided 
into two stages: the optimal orientation vision angle and camera 
motion modeling. 

A. OPTIMAL ORIENTATION VISION ANGLE 

We consider three successive frames of the video sequence 
to estimate the orientation vision angle. These 
frames are $I_{n-1}$, $I_n$, and $I_{n+1}$, corresponding to the 
images at times $n-1$, $n$, and $n+1$ respectively. The objects of 
the first frame ($I_{n-1}$) are compared to the objects of the 
second frame ($I_n$), and the objects of the second frame ($I_n$) 
are compared to the objects of the third frame ($I_{n+1}$). We 
obtain two compensated frames describing what we 
call the first order movement. We then compare the objects of 
the two compensated frames; the result of this comparison is 
a single compensated frame, denoted $\Delta I$, describing what 
we call the second order movement. The second order 
movement is the result of the global movement between the 
first frame ($I_{n-1}$) and the third frame ($I_{n+1}$). The 
orientation vision angle is determined by applying a geometric 
transform to $\Delta I$. This geometric 
transform is a rotation by an angle $\theta$, where $\theta$ is 
interpreted as the field of vision of the observer. The difficulty 
is how to obtain the optimal vision angle $\theta_{opt}$. To resolve 
this difficulty, we developed an algorithm to estimate 
$\theta_{opt}$. This algorithm is described below: 

1. Divide $\Delta I$ into blocks of $n \times n$ pixels, with $n$ odd
2. For each block, apply a rotation with an angle varying from 0° to 360°
3. For each block, compare the object in $\Delta I$ to the object in $I_{n-1}$, $I_n$, or $I_{n+1}$ and count the number of differences using a threshold
4. The optimal vision angle $\theta_{opt}$ is the angle for which the similarity of the compared objects is maximal
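
The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab code: blocks are plain nested lists of grey levels, the rotation uses nearest-neighbour sampling about the block centre, and the names `rotate_block`, `difference_count`, and `optimal_angle`, together with the default threshold and the 10° angular step, are hypothetical choices that the text does not specify.

```python
import math


def rotate_block(block, angle_deg):
    """Rotate a square n x n block (n odd) about its centre.

    Nearest-neighbour sampling; pixels whose source falls outside
    the block are set to 0 (an assumption of this sketch).
    """
    n = len(block)
    c = n // 2
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            dx, dy = x - c, y - c
            # Inverse mapping: locate the source pixel for output (x, y).
            sx = round(cos_t * dx + sin_t * dy) + c
            sy = round(-sin_t * dx + cos_t * dy) + c
            if 0 <= sx < n and 0 <= sy < n:
                out[y][x] = block[sy][sx]
    return out


def difference_count(a, b, threshold):
    """Number of pixels whose absolute difference exceeds the threshold."""
    return sum(
        1
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
        if abs(pa - pb) > threshold
    )


def optimal_angle(delta_block, ref_block, threshold=10, step=10):
    """Angle in [0, 360) that minimises the difference count,
    i.e. maximises the similarity between the two blocks."""
    best_angle, best_diff = 0, None
    for angle in range(0, 360, step):
        rotated = rotate_block(delta_block, angle)
        diff = difference_count(rotated, ref_block, threshold)
        if best_diff is None or diff < best_diff:
            best_angle, best_diff = angle, diff
    return best_angle
```

Calling `optimal_angle` on every block of $\Delta I$ and collecting the returned angles would give per-block statistics of the kind discussed for Figure 2. On very small blocks, neighbouring angles can produce the same nearest-neighbour result, so the first minimising angle is returned.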

To evaluate this algorithm, we use two standard 
video sequences, Tennis and Football, whose frames 
were captured by a mobile camera. 



Fig. 1 Test video sequences: a) Tennis, b) Football 

Fig. 2 Vision angle analysis 


The above algorithm was applied to the two standard video 
sequences. The statistics obtained show 
that 98 % of the blocks yield an average optimal 
vision angle $\theta = 60°$ for the red, green, and blue frame 
components, as shown in Figure 2. Furthermore, Figure 2 
shows that the probability of obtaining frame blocks with a 
vision angle greater than 120° is zero. Figures 3 and 4 
present the objects detected (frame $\Delta I$) for the Football 
and Tennis sequences, respectively. 
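
As a rough illustration of how a frame such as $\Delta I$ could be formed from three successive frames, the sketch below thresholds the two first order differences and then combines them. The text does not say how the two compensated frames are compared, so both obvious readings are offered through a hypothetical `mode` argument; the function names and the default threshold are likewise assumptions.

```python
def first_order(frame_a, frame_b, threshold):
    """Binary map: 1 where the two frames differ by more than the threshold."""
    return [
        [1 if abs(pb - pa) > threshold else 0 for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def second_order(prev_frame, cur_frame, next_frame, threshold=15, mode="and"):
    """Combine the two first order maps into one second order map.

    mode="and": keep pixels that changed in both intervals
                (classic three-frame differencing).
    mode="or":  keep pixels that changed in either interval.
    Which of the two the paper intends is not stated.
    """
    d1 = first_order(prev_frame, cur_frame, threshold)
    d2 = first_order(cur_frame, next_frame, threshold)
    if mode == "and":
        combine = lambda a, b: a & b
    elif mode == "or":
        combine = lambda a, b: a | b
    else:
        raise ValueError("mode must be 'and' or 'or'")
    return [
        [combine(a, b) for a, b in zip(row1, row2)]
        for row1, row2 in zip(d1, d2)
    ]
```

The "and" reading localizes the object at time $n$, while the "or" reading keeps every pixel that moved between times $n-1$ and $n+1$.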



Fig. 3 Football video sequence: a) frame $I_{n-1}$, b) frame $I_n$, c) frame $I_{n+1}$, d) frame $\Delta I$ 

Fig. 4 Tennis video sequence: a) frame $I_{n-1}$, b) frame $I_n$, c) frame $I_{n+1}$, d) frame $\Delta I$ 

B. CAMERA MOVEMENT MODELING 


The estimation of the camera movement is a difficult task 
because the movement of a pixel between two successive 
images depends not only on the camera parameters but 
also on the depth of the captured scene point. The 
camera movement model used in this work is based on the 
model presented in [21]. It is an affine movement model 
that describes the relation between the movement of objects 
and the movement of the observable domains using a 
parametric expression. The model can describe 
movements such as rotation, translation, and zoom using six 
parameters, which are the elements of the vector $a$ defined by 
equation 1: 

$$a = (a_1, a_2, a_3, a_4, a_5, a_6) \qquad (1)$$

This movement model is defined in the horizontal and vertical 
directions by equations 2 and 3, respectively: 

where $w$ and $h$ are respectively the width and the height of a 
video frame. 

The coefficients $c_1$, $c_2$, and $c_3$ are defined by equations 
4, 5, and 6 respectively: 

$$c_1 = \frac{1}{\sqrt{wh}} \qquad (4)$$

$$c_2 = \sqrt{\frac{12}{wh\,(w-1)(w+1)}} \qquad (5)$$

$$c_3 = \sqrt{\frac{12}{wh\,(h-1)(h+1)}} \qquad (6)$$
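
A short numerical check may help in reading these coefficients. The sketch below computes $c_1$, $c_2$, and $c_3$ for a given frame size and sums, over the whole frame, the squares of a constant scaled by $c_1$, of the centred column index scaled by $c_2$, and of the centred row index scaled by $c_3$. Centring the pixel coordinates on $(w-1)/2$ and $(h-1)/2$ is an assumption of this sketch, as is the interpretation of the coefficients as normalization constants; the function names are hypothetical.

```python
import math


def affine_coefficients(w, h):
    """Coefficients c1, c2, c3 of equations 4 to 6 for a w x h frame."""
    c1 = 1.0 / math.sqrt(w * h)
    c2 = math.sqrt(12.0 / (w * h * (w - 1) * (w + 1)))
    c3 = math.sqrt(12.0 / (w * h * (h - 1) * (h + 1)))
    return c1, c2, c3


def basis_energies(w, h):
    """Sum of squares over the frame of the three scaled functions
    c1, c2 * (x - (w - 1) / 2) and c3 * (y - (h - 1) / 2).

    The centred coordinates are an assumption of this sketch.
    """
    c1, c2, c3 = affine_coefficients(w, h)
    e1 = e2 = e3 = 0.0
    for y in range(h):
        for x in range(w):
            e1 += c1 ** 2
            e2 += (c2 * (x - (w - 1) / 2)) ** 2
            e3 += (c3 * (y - (h - 1) / 2)) ** 2
    return e1, e2, e3
```

Under that assumption each of the three sums equals one up to rounding, because the squared centred column indices $0, \ldots, w-1$ sum to $w(w-1)(w+1)/12$. The coefficients can therefore be read as factors that give the constant, horizontal, and vertical terms of the model the same weight, whatever the frame size.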

A picture captured by a camera may contain many moving 
objects combined with the focal movement of the camera. In this 
work, we partition the video frame at time $n$ into $N$ 
blocks of size $m \times m$, with $m = 16$. The movement 
parameters are estimated for each block. This 
estimation describes the movement inside 
each block between the video frame captured at time $n-1$ and 
the one captured at time $n$. 

The camera movement modeling proceeds in two steps: 
first, the determination of the initial motion vector, and 
second, the estimation of the motion parameters. 


C. INITIAL MOTION VECTOR 


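$$MSQ(S_{k,l}, m) = \sum_{x=k}^{k+15} \sum_{y=l}^{l+15} \left( s[x, y, t] - s[x - m_x, y - m_y, t - m_t] \right)^2 \qquad (9)$$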
The Lagrangian multiplier $\lambda$ is chosen following [21] as 
$\lambda = 0.85\,Q^2$, where $Q$ is the discrete cosine transform 
(DCT) quantizer; we used $Q = 7$. In equation 8, $R(S_{k,l}, m)$ is 
the number of bits necessary to represent the motion vector; in 
this work, we take $R(S_{k,l}, m) = 4$ bits. 
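
The rate-constrained search that these parameters control can be sketched as a full-search block matching. The values $\lambda = 0.85\,Q^2$, $Q = 7$, and a rate of 4 bits come from the text; everything else is an assumption of this sketch: frames are nested lists indexed as `frame[y][x]`, only the immediately preceding frame is searched (so the temporal component is fixed), the search range of ±7 pixels is arbitrary, and the function names are hypothetical.

```python
def block_ssd(cur, prev, k, l, mx, my, size):
    """Squared-error distortion between the block of `cur` whose top-left
    corner is (x=k, y=l) and the block of `prev` displaced by (mx, my)."""
    total = 0
    for y in range(l, l + size):
        for x in range(k, k + size):
            diff = cur[y][x] - prev[y - my][x - mx]
            total += diff * diff
    return total


def initial_motion_vector(cur, prev, k, l, size=16, search=7, q=7, rate_bits=4):
    """Return ((mx, my), cost) minimising distortion + lambda * rate."""
    lam = 0.85 * q * q  # Lagrangian multiplier as given in the text
    height, width = len(prev), len(prev[0])
    best, best_cost = None, None
    for my in range(-search, search + 1):
        for mx in range(-search, search + 1):
            # Skip candidates whose reference block leaves the frame.
            if k - mx < 0 or k - mx + size > width:
                continue
            if l - my < 0 or l - my + size > height:
                continue
            cost = block_ssd(cur, prev, k, l, mx, my, size) + lam * rate_bits
            if best_cost is None or cost < best_cost:
                best, best_cost = (mx, my), cost
    return best, best_cost
```

With a constant rate of 4 bits the rate term shifts every cost by the same amount, so the selected vector is the one with the smallest distortion. The term would only influence the choice if the rate depended on the candidate vector.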


1) ESTIMATION OF MOTION PARAMETERS 


The estimated initial motion vector is used to 
compensate the current frame (frame at time $n$) with the 
previous frame (frame at time $n-1$). This is done using 
equation 10: 

$$\hat{s}[x, y, t] = s[x - m_x, y - m_y, t - m_t] \qquad (10)$$

This motion compensation is carried out for each block of 
16×16 pixels using the minimization criterion given by 
equation 11: 

$$a^R = \arg\min_{a} \sum_{(x, y) \in \mathcal{A}} u^2[x, y, t, a] \qquad (11)$$

where $u$ is given by equation 12: 

$$u[x, y, t, a] = s[x, y, t] - \hat{s}\big[x - m_x[a, x, y],\; y - m_y[a, x, y],\; t\big] \qquad (12)$$

It is necessary to linearize the signal given by equation 13 
around the position $(x, y)$, considering a small spatial motion 
$(m_x[a, x, y],\, m_y[a, x, y])$: 

$$\hat{s}\big[x - m_x[a, x, y],\; y - m_y[a, x, y],\; t\big] \qquad (13)$$

The linearization of equation 13 is given by equation 14: 

$$\hat{s}\big[x - m_x[a, x, y],\; y - m_y[a, x, y],\; t\big] \approx \hat{s}[x, y, t] - \frac{\partial \hat{s}[x, y, t]}{\partial x}\, m_x[a, x, y] - \frac{\partial \hat{s}[x, y, t]}{\partial y}\, m_y[a, x, y] \qquad (14)$$

Substituting equation 14 into equation 12 gives 
equation 15: 

$$u[x, y, t, a] \approx s[x, y, t] - \hat{s}[x, y, t] + \frac{\partial \hat{s}[x, y, t]}{\partial x}\, m_x[a, x, y] + \frac{\partial \hat{s}[x, y, t]}{\partial y}\, m_y[a, x, y] \qquad (15)$$

We now use the motion model given by equations 2 and 3, 
with the vector elements given by equation 1, in 
the expression given by equation 11. The expression to 
minimize over these vector elements is then 
given by equation 16: 


We define the initial motion vector $m^I$ by equation 7: 

$$m^I = (m_x, m_y, m_t) \qquad (7)$$

where $m_x$ and $m_y$ represent the spatial movements and 
$m_t$ represents the temporal motion. 

To obtain the initial motion vector, the cost of the estimation 
is given by the Lagrangian defined by equation 8: 

$$m^I = \arg\min_{m \in M} \big( MSQ(S_{k,l}, m) + \lambda R(S_{k,l}, m) \big) \qquad (8)$$

where $MSQ(S_{k,l}, m)$ is the distortion for the block $S_{k,l}$ of 
size 16×16. It is calculated between two successive frames using 
the least squares method defined by equation 9: 

where $g_x$, $g_y$, $x'$, and $y'$ are given respectively by equations 
17, 18, 19, and 20: 

We define the spatial gradient as $\partial \hat{s}[x, y, t] / \partial z$. 
The expression of this spatial gradient is given by equation 21: 


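$$\frac{\partial \hat{s}[x, y, t]}{\partial z} = \frac{1}{4} \sum_{i=0}^{1} \sum_{j=0}^{1} \big( \alpha_{i,j}\, \hat{s}[x + i, y + j, t] + \beta_{i,j}\, s[x + i, y + j, t] \big) \qquad (21)$$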


where $\alpha_{i,j}$ and $\beta_{i,j}$ represent the elements in row $i$ 
and column $j$ of the matrices $A$ and $B$ defined by 
equations 22 and 23: 

To determine the affine motion parameters, we 
first determine the elements $a_i$ of the vector $a$ 
using the least squares method. We calculate $u$ and 
define equation 24: 


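$$\frac{\partial}{\partial a_i} \sum_{(x, y) \in \mathcal{A}} u^2[x, y, t, a] = 0 \quad \forall i \in \{1, 2, \ldots, 6\} \qquad (24)$$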

Evaluating equation 24 yields a linear system, written in 
matrix form in equation 25, where the unknown 
is the vector $X$: 

$$AX = B \qquad (25)$$

We use the Gauss method to solve equation 25 and 
obtain $X = (a_1, a_2, a_3, a_4, a_5, a_6)^T$. 
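
The least squares step can be illustrated with a small solver. The sketch below builds the normal equations for the linearized residual of equation 15 and solves them by Gaussian elimination with partial pivoting, which is one reading of "the Gauss method". Because equations 2 and 3 are not reproduced here, the sketch assumes a plain, un-normalized affine model $m_x = a_1 + a_2 x + a_3 y$, $m_y = a_4 + a_5 x + a_6 y$, which is not necessarily the paper's parametrization. The sample format and the function names are hypothetical.

```python
def gauss_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def estimate_affine(samples):
    """Least squares estimate of (a1, ..., a6).

    Each sample is (x, y, d, gx, gy), where d = s - s_hat at the pixel
    and gx, gy are the spatial gradients of s_hat. The residual is
    u = d + gx * m_x + gy * m_y with the assumed plain affine model.
    """
    big_a = [[0.0] * 6 for _ in range(6)]
    big_b = [0.0] * 6
    for x, y, d, gx, gy in samples:
        phi = [gx, gx * x, gx * y, gy, gy * x, gy * y]
        for i in range(6):
            big_b[i] -= phi[i] * d  # setting d(sum u^2)/da_i = 0
            for j in range(6):
                big_a[i][j] += phi[i] * phi[j]
    return gauss_solve(big_a, big_b)
```

The system is well posed only when the gradients vary enough inside the block; on a flat block the matrix is singular and `gauss_solve` raises an error rather than returning arbitrary parameters.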

The affine motion parameters for the motion 
compensation between two successive frames are obtained by 
concatenating the initial motion vector $m^I$ and the 
estimated parameters $a^R$. The result is presented in equation 
26: 



Fig. 5 Camera motion detection algorithm 


Figures 6 and 7 show the results obtained for the Flowers and 
Football video sequences. 



Fig. 6 Detection of camera motion in the Flowers video sequence: a) 
frame $I_{n-1}$, b) frame $I_n$, c) camera motion 

D. ALGORITHM OF THE DETECTION OF CAMERA 
MOTION 

To detect the movement of the camera, we consider two 
successive frames partitioned into blocks of 16×16 pixels. 
First, we estimate the initial motion vector using the 
least squares method. Second, we compensate the 
affine parameters by concatenating 
the initial motion vector $m^I$ and the parameters $a^R$. Third, 
we estimate the camera motion. Figure 5 shows this 
algorithm. 



Fig. 7 Detection of camera motion in the Football video sequence: a) 
frame $I_{n-1}$, b) frame $I_n$, c) camera motion 


IV. DETECTION OF MOVING OBJECTS AND CAMERA MOTION 

We apply the algorithm presented in Figure 5 for the 
simultaneous detection of the moving objects and the camera 
motion. The result is the intersection between the moving 
objects and the camera motion. Figures 8 to 13 show the 
obtained results. 
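
The intersection step can be written in a few lines. The sketch below assumes, without confirmation from the text, that object motion is available as a per-pixel binary mask and camera motion as a per-block binary decision on 16×16 blocks; the function names are hypothetical.

```python
def expand_block_mask(block_mask, size=16):
    """Turn a per-block 0/1 decision into a per-pixel mask by
    repeating each entry over a size x size block."""
    pixel_mask = []
    for block_row in block_mask:
        row = [value for value in block_row for _ in range(size)]
        pixel_mask.extend([row[:] for _ in range(size)])
    return pixel_mask


def intersect_masks(object_mask, camera_mask):
    """Pixel-wise intersection of two binary masks of equal size."""
    return [
        [1 if (o and c) else 0 for o, c in zip(row_o, row_c)]
        for row_o, row_c in zip(object_mask, camera_mask)
    ]
```

A pixel is kept only when it is flagged in both masks, which corresponds to panel c) of the figures that follow under this reading.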



Fig. 8 Objects and camera motion detection of frame number 6 of 
Flowers video sequence: a) objects detection, b) camera motion 
detection, c) simultaneous objects and camera motion detection 

Fig. 9 Objects and camera motion detection of frame number 8 of 
Flowers video sequence: a) objects detection, b) camera motion 
detection, c) simultaneous objects and camera motion detection 

Fig. 10 Objects and camera motion detection of frame number 139 of 
Tennis video sequence: a) objects detection, b) camera motion 
detection, c) simultaneous objects and camera motion detection 

Fig. 11 Objects and camera motion detection of frame number 143 of 
Tennis video sequence: a) objects detection, b) camera motion 
detection, c) simultaneous objects and camera motion detection 

Fig. 12 Objects and camera motion detection of frame number 91 of 
Football video sequence: a) objects detection, b) camera motion 
detection, c) simultaneous objects and camera motion detection 

Fig. 13 Objects and camera motion detection of frame number 94 of 
Football video sequence: a) objects detection, b) camera motion 
detection, c) simultaneous objects and camera motion detection 


V. OBJECT EXTRACTION 

After detecting the camera motion and the object motion, we 
extract the objects from the video sequences. For this 
task, we developed two algorithms, one for object 
localization and the other for object extraction. 

A. ALGORITHM OF OBJECT LOCALIZATION 

For the localization of the objects, we use the active edge 
method. The algorithm of the active edge method is shown below. 

1. Initialize the edge covering the object to extract 
2. Define the parameters of the elasticity and rigidity of 
the model 


www.erpublication.org 

















Objects Detection and Extraction in Video Sequences Captured by a Mobile Camera 


3. Define the attraction force 
4. Iterate until convergence 
5. Extract the object using the object extraction algorithm 
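
Steps 1 to 4 resemble what is commonly called an active contour or snake. The sketch below is a simple greedy variant written under several assumptions, not the authors' implementation: the contour is a closed list of integer `(x, y)` points, elasticity and rigidity are weighted by hypothetical parameters `alpha` and `beta`, and the attraction force is represented by an external energy grid `ext_energy[y][x]` supplied by the caller (low values attract the contour), weighted by `gamma`.

```python
def local_energy(p, prev_pt, next_pt, ext_energy, alpha, beta, gamma):
    """Energy of placing a contour point at p between its two neighbours."""
    x, y = p
    elastic = (x - prev_pt[0]) ** 2 + (y - prev_pt[1]) ** 2
    rigid = ((prev_pt[0] - 2 * x + next_pt[0]) ** 2
             + (prev_pt[1] - 2 * y + next_pt[1]) ** 2)
    return alpha * elastic + beta * rigid + gamma * ext_energy[y][x]


def greedy_snake(points, ext_energy, alpha=0.1, beta=0.1, gamma=1.0,
                 max_iter=200):
    """Move each point to the lowest-energy position in its 3 x 3
    neighbourhood until no point moves or max_iter sweeps are done."""
    height, width = len(ext_energy), len(ext_energy[0])
    pts = list(points)
    n = len(pts)
    for _ in range(max_iter):
        moved = False
        for i in range(n):
            prev_pt, next_pt = pts[i - 1], pts[(i + 1) % n]  # closed contour
            cx, cy = pts[i]
            best = pts[i]
            best_e = local_energy(best, prev_pt, next_pt,
                                  ext_energy, alpha, beta, gamma)
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    cand = (cx + dx, cy + dy)
                    if 0 <= cand[0] < width and 0 <= cand[1] < height:
                        e = local_energy(cand, prev_pt, next_pt,
                                         ext_energy, alpha, beta, gamma)
                        if e < best_e:  # strict, so ties never cause a move
                            best, best_e = cand, e
            if best != pts[i]:
                pts[i] = best
                moved = True
        if not moved:
            break
    return pts
```

Because the contour keeps a fixed number of points, it can settle on only some of the pixels of the true edge, which is the limitation that the extraction algorithm of the next subsection addresses. The cap `max_iter` guarantees termination even if the greedy updates oscillate.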

Figure 14 shows the results obtained for the Tennis video 
sequence. 




Fig. 14 Objects localization in frame number 139 of Tennis video 
sequence: a) localization of the hand, b) localization of the head, c) 
localization of the body 

Fig. 16 Objects extraction in frame number 39 of Tennis video 
sequence: a) localization of the hand, b) extraction of the hand 

Fig. 17 Objects extraction in frame number 2 of Football video 
sequence: a) localization of the head, b) extraction of the head 

VI. CONCLUSION AND PERSPECTIVES 


B. ALGORITHM OF OBJECT EXTRACTION 

The drawback of the active edge method is 
that it does not recover all the points belonging to 
the edge. It produces a curve containing some, 
but not all, of the edge points; it is therefore difficult to 
extract an object correctly from its active edge, and the 
extracted object loses the regularity of its edge. We therefore 
propose a new interpolation algorithm, based on the number 
of row occurrences, which searches for the lost points 
(points not obtained by the active edge method) in $V$, where $V$ 
represents the characteristic vector of the active edge. This 
algorithm is shown below. 

1. Compare the rows for each pair of successive points $V(2, i)$ and $V(2, i+1)$ 
2. Add the lost points $(x, y)$, with $x \in \,]V(2, i), V(2, i+1)[$ and $y = V(1, i)$ 
3. Calculate the number of points in a given row 
4. Add the lost points 
5. Extract the object using the updated characteristic vector. 
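
One possible reading of these steps is sketched below. It assumes that $V(1, \cdot)$ holds the row of each contour point and $V(2, \cdot)$ its column, that lost points are the columns lying strictly between two successive contour points on the same row, and that the object is finally extracted by keeping, on each row, the pixels between the leftmost and rightmost contour points. The text does not confirm these details, and the function names are hypothetical.

```python
def fill_lost_points(v_rows, v_cols):
    """Steps 1 and 2: for successive contour points on the same row,
    add the columns strictly between them."""
    points = set(zip(v_rows, v_cols))
    for i in range(len(v_rows) - 1):
        if v_rows[i] == v_rows[i + 1]:
            lo, hi = sorted((v_cols[i], v_cols[i + 1]))
            for col in range(lo + 1, hi):
                points.add((v_rows[i], col))
    return points


def row_occurrence(points):
    """Step 3: number of contour points found on each row."""
    counts = {}
    for row, _ in points:
        counts[row] = counts.get(row, 0) + 1
    return counts


def extract_object(image, points, background=0):
    """Step 5 (assumed form): keep, on each row, the pixels between the
    leftmost and rightmost contour points; set the rest to background."""
    spans = {}
    for row, col in points:
        lo, hi = spans.get(row, (col, col))
        spans[row] = (min(lo, col), max(hi, col))
    out = [[background] * len(image[0]) for _ in image]
    for row, (lo, hi) in spans.items():
        for col in range(lo, hi + 1):
            out[row][col] = image[row][col]
    return out
```

The per-row counts from `row_occurrence` could be used to decide which rows still miss edge points before the final extraction; how the paper uses this count in step 4 is not detailed, so the sketch leaves that decision to the caller.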


Figures 15 to 17 present the extracted objects 
for the Tennis and Football video sequences. 

Fig. 15 Objects extraction in frame number 139 of Tennis video 
sequence: a) localization of the head, b) extraction of the head 


In this work, we presented our contribution to object 
detection and extraction in video sequences. 

First, we used pixel differences computed over 
three successive frames. We defined a 
threshold allowing the localization of the moving areas. The 
resulting compensated frame, called $\Delta I$, is analyzed by a 
geometric transformation that simulates the optimal vision 
angle. This analysis makes it possible to observe the maximum 
number of moving points. 

Second, we detected the 
camera motion. We estimated the initial motion vector 
of the affine model of the camera motion; this vector is 
updated by taking into consideration the dynamics of the 
movement, and an algorithm is proposed for this purpose. We 
then evaluated the proposed algorithm on standard 
video sequences. The results make it possible to observe the 
movement of the objects in the video sequences and the 
movement of the camera. 

Third, we used the active edge method to 
extract the objects. The active edge method produces a 
curve containing some, but not all, of the edge points, 
which makes it difficult to extract an object correctly from its 
active edge. To resolve this difficulty, we proposed a new 
interpolation algorithm, based on the number of row 
occurrences, which searches for the lost points. 
Applied to standard video sequences, the method correctly 
extracts some objects, which illustrates its performance. 

As a perspective, we think it is possible to improve on our 
contribution by using a panoramic model, which consists of 
characterizing all static objects once in video sequences 
captured with a mobile camera. With the static objects 
already characterized, the moving objects in the video 
sequences can be analyzed optimally. 